Sequence context-specific profiles for homology searching: Supplementary Information

Author

  • A. Biegert
Abstract

Generation of context profile library. N = 1 million training profiles of length l = 2d + 1 were generated as described in the main text and Figure 2. Each training profile is represented by a count profile c_n(j, x), which specifies the counts of amino acid x ∈ {1, ..., 20} at position j ∈ {−d, ..., d}. These counts are obtained by multiplying the sequence profile t_n(j, x) by the effective number of sequences N_n(j) at position j in the alignment from which training profile t_n(j, x) was calculated: c_n(j, x) = N_n(j) t_n(j, x) (see next section for details). Here, we describe how these N profiles are clustered to obtain a set of K context profiles that recur frequently among the training profiles and that together can describe all training profiles. More precisely, we seek to determine context profiles p = (p_1, ..., p_K) and their prior probabilities α = (α_1, ..., α_K) that maximize the likelihood P(c | p, α) that the training profile counts c = (c_1, ..., c_N) were generated by the context profiles. We model the distribution of counts c_n(j, x) in each column j by a multinomial distribution. Since c_n(j, x) can be real-valued, however, we replace the factorials in the multinomial distribution by Gamma functions (n! = Γ(n + 1)). The probability for context profile p_k to have emitted counts c_n(j, x) (j ∈ {−d, ..., d}, x ∈ {1, ..., 20}) is
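The abstract breaks off before the formula itself. As a rough illustration only, the sketch below evaluates a multinomial log-probability per column with factorials replaced by Gamma functions, as the text describes, and sums the columns. It assumes all columns contribute equally (the original may weight them differently) and that the likelihood of one training profile is a mixture over context profiles with weights α_k. The names log_emission and log_mixture are hypothetical and do not come from the paper.

```python
import math


def log_emission(counts, profile):
    """Log-probability that `profile` emitted the (possibly real-valued) `counts`.

    counts[j][x]  : count of amino acid x in column j
    profile[j][x] : probability of amino acid x in column j
    Assumption: columns are independent and equally weighted.
    """
    total = 0.0
    for c_col, p_col in zip(counts, profile):
        n_j = sum(c_col)
        # log Gamma(N_j + 1) stands in for log(N_j!)
        total += math.lgamma(n_j + 1.0)
        for c, p in zip(c_col, p_col):
            # log Gamma(c + 1) stands in for log(c!)
            total -= math.lgamma(c + 1.0)
            if c > 0.0:
                if p <= 0.0:
                    return float("-inf")  # impossible emission
                total += c * math.log(p)
    return total


def log_mixture(counts, profiles, priors):
    """log sum_k alpha_k * P(counts | p_k), via log-sum-exp (assumed mixture form)."""
    terms = [
        math.log(a) + log_emission(counts, p)
        for a, p in zip(priors, profiles)
        if a > 0.0
    ]
    if not terms:
        return float("-inf")
    m = max(terms)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

For integer counts the result reduces to the ordinary multinomial probability, which gives a simple sanity check; working in log space avoids underflow when 2d + 1 columns of 20 amino acids are multiplied together.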

Similar resources

Sequence context-specific profiles for homology searching.

Sequence alignment and database searching are essential tools in biology because a protein's function can often be inferred from homologous proteins. Standard sequence comparison methods use substitution matrices to find the alignment with the best sum of similarity scores between aligned residues. These similarity scores do not take the local sequence context into account. Here, we present an ...

Improving protein fold recognition with hybrid profiles combining sequence and structure evolution

MOTIVATION Template-based modeling, the most successful approach for predicting protein 3D structure, often requires detecting distant evolutionary relationships between the target sequence and proteins of known structure. Developed for this purpose, fold recognition methods use elaborate strategies to exploit evolutionary information, mainly by encoding amino acid sequence into profiles. Since...

Protein threading using context-specific alignment potential

MOTIVATION Template-based modeling, including homology modeling and protein threading, is the most reliable method for protein 3D structure prediction. However, alignment errors and template selection are still the main bottleneck for current template-base modeling methods, especially when proteins under consideration are distantly related. RESULTS We present a novel context-specific alignmen...

The global trace graph, a novel paradigm for searching protein sequence databases

MOTIVATION Propagating functional annotations to sequence-similar, presumably homologous proteins lies at the heart of the bioinformatics industry. Correct propagation is crucially dependent on the accurate identification of subtle sequence motifs that are conserved in evolution. The evolutionary signal can be difficult to detect because functional sites may consist of non-contiguous residues w...

Accelerating Information Retrieval from Profile Hidden Markov Model Databases

Profile Hidden Markov Model (Profile-HMM) is an efficient statistical approach to represent protein families. Currently, several databases maintain valuable protein sequence information as profile-HMMs. There is an increasing interest to improve the efficiency of searching Profile-HMM databases to detect sequence-profile or profile-profile homology. However, most efforts to enhance searching ef...

Publication date: 2009